1. Introduction

Censored data problems arise frequently in medical applications,
and also in engineering system reliability applications. For example,
in medical survivorship studies some subjects may be lost to
follow-up, or available data may be analyzed before all subjects
have expired. In the equipment reliability context, observed units
may still be in operation, perhaps after several previous failures,
at the time of the analysis. Considerable attention has recently
been devoted to developing informative statistical methods
for handling data of this type (see Kalbfleisch and Prentice (1980)).

It is straightforward, though sometimes computationally
tedious, to deal with censoring in a parametric manner, i.e. by
assuming a specific form for the lifetime distribution (exponential,
Weibull, lognormal, or whatever) and then estimating parameters,
perhaps by maximum likelihood. The approach adopted here
is, instead, to begin with the Kaplan-Meier (1958) product-limit
estimator of survival probability. This estimator is the nonparametric
maximum likelihood estimator of a distribution function
from a sample of singly-censored data. Then, since the jackknife
technique has been shown to be widely useful for obtaining robust
intervals, cf. Miller (1974), it is applied to the Kaplan-Meier
estimate in order to obtain approximate confidence intervals for
the survival probability. It is reasonable to argue that if the
jackknife is to be valid under complex censoring it must perform
correctly in this simplest of all situations, and that if it does work
here then it is likely to work in more complex settings as well.
Therefore, in a sense, we are reporting on the results of a pilot
study of an attractive procedure.

In this paper the effect of jackknifing the Kaplan-Meier
estimate will be examined both by Monte Carlo simulation (sampling
experiments) and by asymptotic analysis. In Section 4 we report
on the results of some extensive Monte Carlo investigations, comparing
confidence limits for survival probability obtained via the
jackknife with those from other techniques. It will be seen that
the jackknife seems to perform well for moderate sample sizes, even
under some rather unusual conditions. In Section 5, asymptotic
results are reported that provide theoretical underpinnings for
the jackknife procedure, at least for large sample sizes. Specifically,
it is shown that the jackknifed estimate is approximately
normal with the asymptotically correct variance, and hence produces
asymptotically correct confidence limits for the survival probability
estimated by the Kaplan-Meier method. Taken by itself, this result
may not be terribly important, because an expression for the variance
of the estimator is known, and it can be estimated by substituting
estimates of any unknown functions into the expression. However,
for doubly censored data (cf. Turnbull (1974)), and for data with
censoring and truncation (cf. Turnbull (1978)), the situation is more
complex. The fact that the jackknife works in the singly censored
case makes it more likely that it works for these more complex
censoring patterns, and for others as well.
It should be noted that the bootstrap procedure, a
resampling approach investigated by Efron (1979, 1981), is
also applicable to complex censoring situations, apparently
giving results in good agreement with Greenwood's formula for
a particular case investigated.
2. Formulation of the Problem; the Kaplan-Meier Estimate



Suppose $x_1, x_2, \ldots, x_n$ are $n$ observed survival times,
e.g. of medical patients or of items of equipment subject to failure.
Some of these observations are of complete lifetimes (failure
times) but others are not, having been censored by the time of
observation. For short, we refer to complete observations as
deaths, and to censored observations as losses. Censoring simply
means that a "complete time" is not observed, although a "partial
time," up to the censoring, is. Censoring complicates the problem
of estimating the theoretical survival probability to time
$x$, denoted by $\bar F^0(x) = 1 - F^0(x)$.

Kaplan and Meier (1958) furnish a maximum likelihood
estimate of $\bar F^0(x)$ from among the class of admissible
distributions. This product-limit estimate may be written in several
equivalent ways, assuming no ties among the observations:

$$\bar F_n^0(x) = \prod_{x_i < x} \left(\frac{n - r_i}{n - r_i + 1}\right)^{\delta_i} \qquad (2.1,a)$$

$$\bar F_n^0(x) = \prod_{i=1}^{n} \left(\frac{n - i}{n - i + 1}\right)^{\delta_{(i)}(x)} \qquad (2.1,b)$$

$$\bar F_n^0(x) = \prod_{i=1}^{k(x)} \frac{n_i - \delta_i}{n_i} \qquad (2.1,c)$$

In (2.1,a), $r_i$ is the rank of $x_i$ among the ordered observations
$x_{(1)} < x_{(2)} < \cdots < x_{(n)}$, and $\delta_i$ is unity if $x_i$ is an
observed death, being zero otherwise. In (2.1,b),

$$\delta_{(i)}(x) = \begin{cases} 1 & \text{if } x_{(i)} < x \text{ and is a time of death (uncensored)}, \\ 0 & \text{otherwise}. \end{cases} \qquad (2.2)$$

In (2.1,c), $n_i \;(= n - (i-1))$ represents the number of items exposed
(to either death or loss) at the $i$th ordered time, $\delta_i$ is the death
indicator of that $i$th ordered observation, and $k(x)$ is the number of
observations (deaths or losses) that precede time $x$.

A numerical example helps to explain the estimate. Suppose
the data points are

1 < 2* < 4 < 5* < 7* < 8 < 10

where the starred measurements are losses, and the rest deaths.
Let us estimate the survival probability to or beyond $x = 6$. Then,
since $n = 7$ and $k(6) = 4$ (the deaths 1 and 4 and the losses 2* and 5*
precede $x = 6$),

$$\bar F_7^0(6) = \frac{6}{7}\cdot\frac{4}{5} = \frac{24}{35} \approx 0.686 \quad \text{by (2.1,b)},$$

$$\bar F_7^0(6) = \frac{7-1}{7}\cdot\frac{6-0}{6}\cdot\frac{5-1}{5}\cdot\frac{4-0}{4} = \frac{24}{35} \quad \text{by (2.1,c)}.$$

Note that by definition (2.2) the estimate jumps down
following data values that are deaths, does not jump at
losses, and remains constant between down-jumps. Technically,
$\bar F_n^0(x)$ is a left-continuous, monotonically non-increasing step
function; this makes $F_n^0(x) = 1 - \bar F_n^0(x)$, the estimated distribution
of time of death, left-continuous as well.
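A minimal Python sketch of (2.1,b), assuming no ties; the function name `km_survival` is ours, not the paper's. It reproduces the numerical example above and also shows the left-continuity: at $x = 1$ the estimate is still 1, because a death only causes a jump for $x$ strictly beyond it.

```python
def km_survival(times, deaths, x):
    """Product-limit estimate of survival beyond x, following (2.1,b).

    times  -- observed times x_1, ..., x_n (assumed to have no ties)
    deaths -- True for an observed death, False for a loss (censored)
    """
    n = len(times)
    surv = 1.0
    # Walk through the ordered observations x_(1) < ... < x_(n).
    for i, (t, is_death) in enumerate(sorted(zip(times, deaths)), start=1):
        # delta_(i)(x) = 1 only for deaths strictly before x, as in (2.2).
        if is_death and t < x:
            surv *= (n - i) / (n - i + 1)
    return surv


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]  # 2, 5 and 7 are losses
print(km_survival(times, deaths, 6))  # approximately 0.6857, i.e. 24/35
print(km_survival(times, deaths, 1))  # 1.0: no jump yet at x = 1
```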
3. Interval Estimates for the Kaplan-Meier Estimate

For a given set of data the K.-M. estimate provides a
point estimate of the survival probability. It is, of course,
desirable to assess the stability of such an estimate under
reasonable assumptions about the origin of the data; specifically, it
is useful to furnish approximate confidence intervals for a
survival distribution $\bar F^0(x)$. The jackknife procedure, see Miller
(1974) and Mosteller and Tukey (1977), is one way of producing
such limits. In this section we describe the computation of
jackknife limits and of confidence limits obtained by alternative
procedures; the procedures are compared by simulation in Section 4.

3.1. The Jackknife Procedure 

The jackknife procedure is well described in Mosteller
and Tukey (1977), where it is pointed out that a preliminary
transformation to approximately symmetrize the sampling
distribution of the estimator is beneficial; see also Cressie (1981).

For this study we have chosen to use the classical "inverse
sine" transformation, which tends both to stabilize the variance of
binomial count data and to approximately symmetrize its distribution.
This transformation is suggested because the number of sample items
surviving a fixed time would be binomial under ideal conditions if
there were no censoring. Initial experiments with a logistic
transformation proved to be less satisfactory, as did a simple log
transformation; in practice, both log and logistic transformations
must involve a "start" (a small constant added to avoid taking the
logarithm of zero), see Tukey (1977), which influences the coverage.
A natural choice is $1/(2n)$, see Cox (1972), but it results in
systematic undercoverage of the confidence intervals, empirically
suggesting a larger value. Here is our procedure.
(a) Select a value of $x$ at which to estimate survival probability.

(b) Compute $\bar F_n^0(x)$, e.g. by (2.1).

(c) Compute $A_n(x) = \sin^{-1}\sqrt{\bar F_n^0(x)}$.

(d) Compute $\bar F_{n-1,j}^0(x)$, the K.-M. estimate leaving out the
$j$th ordered observation, whether it be an observed (recorded)
death or a loss. The formula actually used was

$$\bar F_{n-1,j}^0(x) = \prod_{i=1}^{j-1}\left(\frac{n-1-i}{n-i}\right)^{\delta_{(i)}(x)} \prod_{i=j+1}^{n}\left(\frac{n-i}{n-i+1}\right)^{\delta_{(i)}(x)}. \qquad (3.1)$$

(e) Compute $A_{n-1,j}(x) = \sin^{-1}\sqrt{\bar F_{n-1,j}^0(x)}$.

(f) Compute the $j$th pseudovalue:

$$V_j = n A_n(x) - (n-1) A_{n-1,j}(x), \qquad j = 1, 2, \ldots, n.$$

(g) Find the mean and variance of the pseudovalues:

$$\bar V = \frac{1}{n}\sum_{j=1}^{n} V_j, \qquad s_V^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(V_j - \bar V\right)^2, \qquad s_V = \sqrt{s_V^2}.$$
(h) Compute approximate two-sided $(1-\alpha)\cdot 100\%$ confidence
limits as follows:

$$L = \bar V - t_{1-\alpha/2}(n-1)\,\frac{s_V}{\sqrt{n}} \;\le\; \sin^{-1}\sqrt{\bar F^0(x)} \;\le\; \bar V + t_{1-\alpha/2}(n-1)\,\frac{s_V}{\sqrt{n}} = U, \qquad (3.2)$$

where $t_{1-\alpha/2}(n-1)$ is the $(1-\alpha/2)\cdot 100\%$ point of Student's $t$
distribution with $n-1$ degrees of freedom; then invert to obtain
approximate two-sided $(1-\alpha)\cdot 100\%$ confidence limits for survival
beyond $x$:

$$\sin^2(L) \le \bar F^0(x) \le \sin^2(U). \qquad (3.3)$$

Theoretical justification of this procedure for large $n$ is
given in Section 5 of this paper, and its performance in moderate
samples is illustrated by the simulation results of Section 4.
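The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used for the study: the function names are ours, ties are assumed absent, and the caller supplies the Student $t$ point (here $t_{0.975}(6) \approx 2.4469$ for $n = 7$), since the Python standard library has no $t$ quantile. Clipping $L$ and $U$ to $[0, \pi/2]$ before (3.3) is our own safeguard so that $\sin^2$ is monotone; it is not part of the stated procedure.

```python
from math import asin, pi, sin, sqrt


def km_survival(times, deaths, x):
    """Product-limit estimate (2.1,b); assumes no tied times."""
    n = len(times)
    surv = 1.0
    for i, (t, d) in enumerate(sorted(zip(times, deaths)), start=1):
        if d and t < x:
            surv *= (n - i) / (n - i + 1)
    return surv


def km_leave_one_out(times, deaths, x, j):
    """(3.1): estimate with the j-th ordered observation (1-based) removed."""
    n = len(times)
    surv = 1.0
    for i, (t, d) in enumerate(sorted(zip(times, deaths)), start=1):
        if i == j or not (d and t < x):
            continue  # the deleted point, or a factor with delta_(i)(x) = 0
        # Ranks before j are unchanged in the reduced sample of size n - 1;
        # ranks after j drop by one.
        surv *= (n - 1 - i) / (n - i) if i < j else (n - i) / (n - i + 1)
    return surv


def jackknife_arcsine_ci(times, deaths, x, t_crit):
    """Steps (c)-(h); t_crit is t_{1-alpha/2}(n-1), supplied by the caller."""
    n = len(times)
    a_n = asin(sqrt(km_survival(times, deaths, x)))                 # step (c)
    pseudo = []
    for j in range(1, n + 1):                                       # deaths and losses alike
        a_j = asin(sqrt(km_leave_one_out(times, deaths, x, j)))     # steps (d), (e)
        pseudo.append(n * a_n - (n - 1) * a_j)                      # step (f)
    v_bar = sum(pseudo) / n                                         # step (g)
    s_v = sqrt(sum((v - v_bar) ** 2 for v in pseudo) / (n - 1))
    half = t_crit * s_v / sqrt(n)                                   # step (h), (3.2)
    lower = min(max(v_bar - half, 0.0), pi / 2)   # clipping: our own assumption
    upper = min(max(v_bar + half, 0.0), pi / 2)
    return sin(lower) ** 2, sin(upper) ** 2                         # (3.3)


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]
print(jackknife_arcsine_ci(times, deaths, 6, t_crit=2.4469))
```

Recomputing the estimate from scratch on each reduced sample gives the same values as (3.1); the closed form merely avoids re-sorting. With only seven observations the resulting interval is, as one would expect, very wide.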

3.2. Alternatives to the Jackknife: "Greenwood's formula" 

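The classical estimate of the variance of the estimate
$\bar F_n^0(x)$ is given by "Greenwood's formula," see Kaplan and Meier
(1958), p. 477, or Thomas and Grunkemeier (1975), p. 867. Again,
when no ties are present this may be expressed as

$$\widehat{\mathrm{Var}}\left[\bar F_n^0(x)\right] = \left[\bar F_n^0(x)\right]^2 \sum_{i=1}^{k(x)} \frac{\delta_i}{n_i(n_i - \delta_i)}. \qquad (3.4)$$

It is interesting and reassuring that this approximate formula
delivers exactly $\bar F_n^0(x)[1 - \bar F_n^0(x)]/n$, the binomial variance
estimate, when all observed events are deaths: then $n_i = n - i + 1$,
and the sum telescopes to $1/(n-k) - 1/n$, with $\bar F_n^0(x) = (n-k)/n$.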

It follows that approximate two-sided $(1-\alpha)\cdot 100\%$
confidence limits may be obtained by this procedure:

a) Select a value of $x$ at which to estimate survival probability.

b) Compute $\bar F_n^0(x)$, the point estimate of survival probability.

c) Compute $s_G = \sqrt{\widehat{\mathrm{Var}}\left[\bar F_n^0(x)\right]}$ from (3.4).

d) Compute approximate two-sided $(1-\alpha)\cdot 100\%$ confidence
limits:

$$L_G = \bar F_n^0(x) - z_{1-\alpha/2}\, s_G, \qquad U_G = \bar F_n^0(x) + z_{1-\alpha/2}\, s_G,$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)\cdot 100$ percent point of
the unit Normal. Then

$$L_G \le \bar F^0(x) \le U_G \qquad (3.5)$$

with approximately the quoted confidence.



For a justification of the above procedure when $n$ is large,
refer to Breslow and Crowley (1974); following Thomas and Grunkemeier
(1975), we will call it the $Z_1$ procedure. Simulation results
appear in Section 4.
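A minimal sketch of Greenwood's formula (3.4) and the $Z_1$ limits (3.5), again assuming no ties; the function name `greenwood_z1` is ours. As in (3.5), the limits are not truncated to $[0, 1]$. The usage lines check the remark above that, with no losses, (3.4) reduces to the binomial variance.

```python
from math import sqrt
from statistics import NormalDist


def greenwood_z1(times, deaths, x, alpha=0.05):
    """Return (estimate, Greenwood variance (3.4), Z_1 limits (3.5))."""
    n = len(times)
    surv, acc = 1.0, 0.0
    for i, (t, d) in enumerate(sorted(zip(times, deaths)), start=1):
        if t >= x:
            break                        # only the k(x) observations before x matter
        n_i = n - i + 1                  # number exposed at the i-th ordered time
        if d:
            surv *= (n_i - 1) / n_i
            if n_i > 1:                  # if n_i == 1 the estimate (and (3.4)) is 0
                acc += 1.0 / (n_i * (n_i - 1))   # delta_i / (n_i (n_i - delta_i))
    var = surv ** 2 * acc
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{1 - alpha/2}
    half = z * sqrt(var)
    return surv, var, (surv - half, surv + half)


# With no losses, (3.4) should equal S(1 - S)/n; here S = 0.6 and n = 10.
est, var, _ = greenwood_z1(list(range(1, 11)), [True] * 10, 4.5)
print(est, var, est * (1 - est) / 10)
```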

3.3. An Approximate Likelihood-Ratio Interval Estimate 



Thomas and Grunkemeier (1975) propose use of a likelihood-ratio
based procedure for obtaining approximate $(1-\alpha)\cdot 100\%$
confidence limits. In outline, the procedure approximately maximizes
the likelihood of a survival function under a constraint;
this will be called the $Z_2$ procedure. For a similar development
see Madansky (1965). Specifically, one maximizes the likelihood
(5d) of Kaplan and Meier, subject to the constraint that survival
to time $x$ equals $\bar F^0$:

$$\max_{(p_i,\lambda)}\Big\{\sum_{i=1}^{k(x)}\big[\delta_i\ln(1-p_i) + (n_i+\lambda-\delta_i)\ln p_i\big] + \sum_{i=k(x)+1}^{n}\big[\delta_i\ln(1-p_i) + (n_i-\delta_i)\ln p_i\big] - \lambda\ln\bar F^0\Big\}, \qquad (3.6)$$

giving estimates

$$p_i(\lambda) = \frac{n_i+\lambda-\delta_i}{n_i+\lambda},\quad i = 1,2,\ldots,k(x); \qquad p_i = \frac{n_i-\delta_i}{n_i},\quad i = k(x)+1,\ldots,n, \qquad (3.7)$$

and

$$\prod_{i=1}^{k(x)} p_i(\lambda) \equiv \bar F^0(x;\lambda) = \bar F^0 \qquad (3.8)$$

from the constraint condition. Next, (numerically) solve the
equation

$$\frac{\bar F_n^0(x) - \bar F^0(x;\lambda)}{\bar F^0(x;\lambda)\sqrt{V(\lambda)}} = \pm z_{1-\alpha/2} \qquad (3.9)$$

for $\lambda_L$ and $\lambda_U$, where $\bar F_n^0(x)$ is the product-limit estimate of
survival beyond $x$, $\bar F^0(x;\lambda)$ is given by (3.8), and $z_{1-\alpha/2}$
is the $(1-\alpha/2)\cdot 100$th percent point of the unit normal distribution.
Then, according to Thomas and Grunkemeier (see footnote,
p. 867), $V(\lambda)$ may be expressed as follows:



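$$V(\lambda) = \begin{cases} \dfrac{n+\lambda}{n}\displaystyle\sum_{i=1}^{k(x)}\frac{\delta_i}{(n_i+\lambda)(n_i+\lambda-\delta_i)}, & \bar F_n^0(x) < 1, \\[1ex] \dfrac{1-\bar F^0(x;\lambda)}{\bar F^0(x;\lambda)\,n(x)}, & \bar F_n^0(x) = 1; \end{cases} \qquad (3.10)$$

here $n(x)$ is the number of individuals exposed at $x$. Finally,
(approximate) lower and upper confidence limits for $\bar F^0(x)$ are
obtained by substituting $\lambda_L$ and $\lambda_U$ into (3.8):

$$\bar F_L^0(x) = \prod_{i=1}^{k(x)}\frac{n_i+\lambda_L-\delta_i}{n_i+\lambda_L} \quad\text{and}\quad \bar F_U^0(x) = \prod_{i=1}^{k(x)}\frac{n_i+\lambda_U-\delta_i}{n_i+\lambda_U}. \qquad (3.11)$$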
The principal difficulty with application of this
method is the numerical solution of (3.9) for the roots $\lambda_L$
and $\lambda_U$. A Newton-Raphson method was used in the program
developed for this study. It was only feasible to make extensive
trials of the procedure for sample size $n = 25$.
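The $Z_2$ computation can be sketched as follows, using (3.7) to (3.11) as written above. Two choices are ours rather than the paper's: bisection replaces the Newton-Raphson iteration mentioned above, because it needs only a sign change of (3.9), and only the first case of (3.10) is implemented, so the sketch requires at least one death before $x$ and a nonzero product-limit estimate. The function name `z2_interval` is hypothetical.

```python
from math import prod, sqrt
from statistics import NormalDist


def z2_interval(times, deaths, x, alpha=0.05, tol=1e-10):
    """Approximate likelihood-ratio limits: solve (3.9) for lambda_L, lambda_U."""
    n = len(times)
    # n_i for the deaths among the k(x) ordered observations before x; losses
    # have delta_i = 0, so they give p_i = 1 in (3.8) and add nothing to (3.10).
    risk = [n - i + 1 for i, (t, d) in enumerate(sorted(zip(times, deaths)), start=1)
            if d and t < x]
    if not risk or min(risk) == 1:
        raise ValueError("sketch needs a death before x and a nonzero estimate")

    def surv(lam):                     # F(x; lambda) of (3.8), with p_i from (3.7)
        return prod((m + lam - 1) / (m + lam) for m in risk)

    def v(lam):                        # first case of (3.10)
        return (n + lam) / n * sum(1.0 / ((m + lam) * (m + lam - 1)) for m in risk)

    km = surv(0.0)                     # lambda = 0 reproduces the product-limit estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)

    def g(lam):                        # left-hand side of (3.9)
        f = surv(lam)
        return (km - f) / (f * sqrt(v(lam)))

    def bisect(h, lo, hi):             # assumes h(lo) > 0 >= h(hi)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if h(mid) > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lam_min = max(1 - m for m in risk)                  # keeps every p_i > 0
    lam_l = bisect(lambda lam: g(lam) - z, lam_min + 1e-9, 0.0)
    hi = 1.0
    while g(hi) > -z:                  # widen the bracket for the upper root
        hi *= 2.0
    lam_u = bisect(lambda lam: g(lam) + z, 0.0, hi)
    return surv(lam_l), surv(lam_u)    # (3.11)


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]
print(z2_interval(times, deaths, 6))
```

At $\lambda = 0$, $V(\lambda)$ reduces to the Greenwood sum in (3.4) divided by $[\bar F_n^0(x)]^2$, which is a convenient consistency check on the reconstruction.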
4. Simulation Results



In order to compare the performance of the jackknife
procedure to the other candidates described above, namely
$Z_1$ and $Z_2$, some of the particular cases treated by Thomas and
Grunkemeier (1975, p. 168ff.) were simulated, and nominal 95%
and 90% confidence limits were constructed. We summarize the
results in the following tables. Note that interval performance
is assessed at three probability-of-survival levels, 0.75, 0.50
and 0.25, for each combination of death and loss distributions.

Examination of the tabulations of confidence limit coverage,
and also of the averages and standard deviations of c.i. widths,
suggests that the jackknife confidence intervals perform in a generally
conservative manner compared with the "Greenwood's formula"
results ($Z_1$) and the approximate likelihood ratio method ($Z_2$).
That is, JK tends to over-cover, while $Z_1$ consistently under-covers;
$Z_2$ has some tendency to under-cover with severe losses
(Case 1) and for small probabilities of survival, but generally
performs well. Of the three procedures, $Z_2$ is by far the most
difficult and expensive to carry out; the computer time involved
in computing $Z_2$ for $n = 50$ prohibited tabulation of those results
for this study. Note that the tendency of the jackknife to over-cover
is reduced as the probability of survival decreases. Actually,
absurdly low values occur for survival probabilities 0.50 and 0.25
in Case 1; they are a consequence of the severe censoring assumed.
In general, the results obtained indicate that the jackknife procedure
is a worthy competitor of "Greenwood's formula" under present
circumstances, and that it performs only a little less effectively
than does the approximate likelihood-based procedure $Z_2$. The
presented jackknife technique tends to be conservative.

[Tables: confidence-limit coverage and c.i. widths for JK, $Z_1$ and $Z_2$; Cases 2, 4, 5, 6 and 8: $X_i$ (death times) independent unit exponential; $Y_i$ (loss times) ...]

In order to supplement the above information, a number of
additional simulations were made to investigate the effect of
departure from the random censoring model. Specifically, the
censoring time $Y_i$ was allowed to depend probabilistically
upon the time of death $X_i$ for a sequence of experiments. A
selection of the results obtained is shown next.

In the above situations, in which $X_i$ and $Y_i$ are now
contrived to be positively dependent, once again the jackknife
tends to result in over-coverage, i.e. it is conservative, and
sometimes radically so. This is to be contrasted with the Greenwood's
formula results, $Z_1$, which generally under-cover. Here there
is some indication that the likelihood ratio procedure, $Z_2$,
has a tendency to under-cover when the survival probability is
near 0.5. Of course, all results are for rather small sample
sizes, and refer to exponentially distributed deaths.
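A small Monte Carlo sketch in the spirit of this section, not a reproduction of the paper's cases: deaths are unit exponential as in the tables, but the exponential loss rate 0.5, the survival level 0.5, $n = 25$, 400 replications and the seed are all our own illustrative choices. It estimates the actual coverage of the nominal 95% jackknife limits (3.2) and (3.3); 2.0639 is $t_{0.975}(24)$, and the function names are hypothetical.

```python
import random
from math import asin, log, pi, sin, sqrt


def km_survival(times, deaths, x):
    """Product-limit estimate (2.1,b); continuous data, so no ties."""
    n = len(times)
    surv = 1.0
    for i, (t, d) in enumerate(sorted(zip(times, deaths)), start=1):
        if d and t < x:
            surv *= (n - i) / (n - i + 1)
    return surv


def jackknife_limits(times, deaths, x, t_crit):
    """Arcsine jackknife limits (3.2)-(3.3), deleting deaths and losses alike."""
    n = len(times)
    a_n = asin(sqrt(km_survival(times, deaths, x)))
    pseudo = [n * a_n - (n - 1) * asin(sqrt(km_survival(times[:j] + times[j + 1:],
                                                         deaths[:j] + deaths[j + 1:], x)))
              for j in range(n)]
    v_bar = sum(pseudo) / n
    half = t_crit * sqrt(sum((v - v_bar) ** 2 for v in pseudo) / (n - 1)) / sqrt(n)
    lo = min(max(v_bar - half, 0.0), pi / 2)   # clipping is our own assumption
    hi = min(max(v_bar + half, 0.0), pi / 2)
    return sin(lo) ** 2, sin(hi) ** 2


def jackknife_coverage(n=25, p_surv=0.5, loss_rate=0.5, reps=400, t_crit=2.0639, seed=1):
    """Fraction of replications whose interval covers the true survival p_surv."""
    rng = random.Random(seed)
    x = -log(p_surv)                    # unit-exponential deaths: P(X^0 > x) = p_surv
    hits = 0
    for _ in range(reps):
        x0 = [rng.expovariate(1.0) for _ in range(n)]        # death times
        y = [rng.expovariate(loss_rate) for _ in range(n)]   # independent loss times
        times = [min(a, b) for a, b in zip(x0, y)]
        deaths = [a <= b for a, b in zip(x0, y)]
        lo, hi = jackknife_limits(times, deaths, x, t_crit)
        hits += lo <= p_surv <= hi
    return hits / reps


print(jackknife_coverage())
```

Changing `loss_rate`, `p_surv` or `n` gives a quick feel for how the coverage depends on the amount of censoring; the dependent-censoring experiments described above would require a different data generator.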
5. Summary of Theoretical Developments



In this section a probability model for random censoring 
is introduced. In terms of this model it will be shown that the 
jackknife produces asymptotically correct confidence limits for the 
survival probability from the Kaplan-Meier estimator. A priori 
one could not be certain whether to systematically delete each 
observation in turn when applying the jackknife or whether to 
delete only the uncensored ones. Our results show that the proper 
method is to delete each observation, censored or uncensored. 

5.1. The Model 

Let $X_1^0, X_2^0, \ldots, X_n^0$ be independent random variables distributed
according to cdf $F^0(x)$, which is continuous with $F^0(0) = 0$.
In medical applications $X_i^0$ represents the survival time of the
$i$th patient, and in engineering reliability it represents the
time to failure of the $i$th item of equipment (or the $i$th time to
failure of an item, when appropriate). The problem is to estimate $F^0$,
but unfortunately the $X_i^0$ are not all directly observable.

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables, identically
distributed according to cdf $G$, the latter being continuous
with $G(0) = 0$. The observable variables are then

$$X_i = \min\{X_i^0, Y_i\} \quad\text{and}\quad \delta_i = I\{X_i^0 \le Y_i\}, \qquad (5.1)$$

where $I\{A\}$ is the indicator function of event $A$. The $Y_i$
variables represent censoring times, and are assumed to be
independent of the $X_i^0$. The statistician actually observes the smaller
of the two variables, and also knows whether the observation is
uncensored (a "death") or censored (a "loss").
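To make the random censoring model (5.1) concrete, the following sketch draws data from it under purely illustrative choices of our own: $F^0$ unit exponential and $G$ exponential with rate 0.5, so that $P(X_i^0 \le Y_i) = 1/(1 + 0.5) = 2/3$. It then checks that the product-limit estimate computed from the observable pairs is close to the true survival $e^{-x}$. The function names and the seed are hypothetical.

```python
import random
from math import exp


def simulate_random_censoring(n, loss_rate, seed=0):
    """Draw observable pairs (X_i, delta_i) as in (5.1).

    Illustrative assumptions: X_i^0 unit exponential, Y_i exponential(loss_rate).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x0 = rng.expovariate(1.0)        # unobservable survival time X_i^0
        y = rng.expovariate(loss_rate)   # censoring time Y_i, independent of X_i^0
        data.append((min(x0, y), x0 <= y))
    return data


def km_survival(data, x):
    """Product-limit estimate (2.1,b) from (time, is_death) pairs."""
    n = len(data)
    surv = 1.0
    for i, (t, d) in enumerate(sorted(data), start=1):
        if d and t < x:
            surv *= (n - i) / (n - i + 1)
    return surv


data = simulate_random_censoring(5000, loss_rate=0.5, seed=42)
frac_deaths = sum(d for _, d in data) / len(data)
print(frac_deaths, 1 / 1.5)                  # observed vs P(X^0 <= Y)
print(km_survival(data, 0.5), exp(-0.5))     # estimate vs 1 - F^0(0.5)
```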

5.2. Cumulative Hazard Function

The Kaplan-Meier estimator is closely related to the
sample cumulative hazard function (chf). The latter is defined
as

$$\Lambda_n(x) = \sum_{i=1}^{n} \frac{\delta_{(i)}(x)}{n-i+1}, \qquad (5.2)$$

where $\delta_{(i)}(x)$ is defined in (2.2). In fact Breslow and Crowley
(1974) show that

$$-\ln\left[1 - F_n^0(x)\right] = \Lambda_n(x) + O_p(1/n), \qquad (5.3)$$

and it may be shown that

$$\Lambda_n(x) \to \int_0^x \frac{dF^0(u)}{1 - F^0(u)} = \Lambda^0(x), \qquad (5.4)$$

the integral of the hazard function $\lambda^0(u) = dF^0(u)/[1 - F^0(u)]$;
both (5.3) and (5.4) justify the name given to $\Lambda_n$.

It is convenient to show that the jackknifed estimator of
$\bar F^0$, denoted by $\tilde F_n(x)$, is asymptotically normal by starting
with $\Lambda_n$. If one shows that $\Lambda_n(x)$ is asymptotically normally
distributed, then it follows that $\bar F_n^0(x)$ is also asymptotically
normal, as is true of other sufficiently smooth functions (e.g. arc sine)
of $\bar F_n^0(x)$. If, in addition, it is shown that the jackknife variance
is consistent, then the jackknife confidence procedure described in
Section 3 and illustrated in Section 4 is justified for large sample sizes.
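Relation (5.3) can be seen term by term: each death contributes $1/m$ to (5.2), with $m = n - i + 1$, but contributes $-\ln(1 - 1/m) = 1/m + 1/(2m^2) + \cdots$ to $-\ln \bar F_n^0(x)$, so the chf falls slightly below $-\ln \bar F_n^0(x)$. The sketch below (function name ours) computes both quantities, plus the next series term, for the example of Section 2.

```python
from math import log


def chf_and_km(times, deaths, x):
    """Return Lambda_n(x) of (5.2), the next series term, and the K.-M. estimate."""
    n = len(times)
    chf, second, km = 0.0, 0.0, 1.0
    for i, (t, d) in enumerate(sorted(zip(times, deaths)), start=1):
        if d and t < x:
            m = n - i + 1
            chf += 1.0 / m                 # term of (5.2)
            second += 1.0 / (2 * m * m)    # next term of -ln(1 - 1/m)
            km *= (m - 1) / m              # factor of (2.1,b)
    return chf, second, km


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]
chf, second, km = chf_and_km(times, deaths, 6)
print(chf, -log(km), chf + second)   # 12/35, ln(35/24), and a closer approximation
```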
5.3. Asymptotic Normality



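Let $\Lambda_{n-1}(x;i)$ be the sample chf when the $i$th ordered
observation is deleted from the sample. Then

$$\Lambda_{n-1}(x;i) = \sum_{j=1}^{i-1} \frac{\delta_{(j)}(x)}{n-j} + \sum_{j=i+1}^{n} \frac{\delta_{(j)}(x)}{n-j+1}. \qquad (5.5)$$

The corresponding pseudovalue is

$$\tilde\Lambda_n(x;i) = n\Lambda_n(x) - (n-1)\Lambda_{n-1}(x;i) = \frac{n\,\delta_{(i)}(x)}{n-i+1} - \sum_{j=1}^{i-1}\frac{(j-1)\,\delta_{(j)}(x)}{(n-j)(n-j+1)} + \sum_{j=i+1}^{n}\frac{\delta_{(j)}(x)}{n-j+1}. \qquad (5.6)$$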



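The jackknifed estimator is the average of the pseudovalues.
From (5.6),

$$\begin{aligned}
\tilde\Lambda_n(x) &= \frac{1}{n}\sum_{i=1}^{n}\tilde\Lambda_n(x;i) \\
&= \sum_{i=1}^{n}\frac{\delta_{(i)}(x)}{n-i+1} - \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\frac{(j-1)\,\delta_{(j)}(x)}{(n-j)(n-j+1)} + \frac{1}{n}\sum_{i=1}^{n}\sum_{j=i+1}^{n}\frac{\delta_{(j)}(x)}{n-j+1} \\
&= \Lambda_n(x) - \frac{1}{n}\sum_{j=1}^{n-1}\frac{(n-j)(j-1)\,\delta_{(j)}(x)}{(n-j)(n-j+1)} + \frac{1}{n}\sum_{j=2}^{n}\frac{(j-1)\,\delta_{(j)}(x)}{n-j+1} \\
&= \Lambda_n(x) + \frac{n-1}{n}\,\delta_{(n)}(x).
\end{aligned} \qquad (5.7)$$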

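Thus the jackknifed estimator and the original estimator
differ by an asymptotically negligible term, since $\delta_{(n)}(x) = 0$
as soon as the largest observation exceeds $x$. Now it has been
shown that $\Lambda_n(x)$ is asymptotically normal with mean $\Lambda^0(x)$ and
variance

$$\frac{1}{n}\int_0^x \frac{dF^0}{(1-F)(1-F^0)} \qquad (5.8)$$

(cf. Breslow and Crowley (1974), Theorem 4), where $F$ denotes the
distribution function of the observed $X_i = \min\{X_i^0, Y_i\}$, so that
$1 - F = (1 - F^0)(1 - G)$; and so it follows that $\tilde\Lambda_n(x)$ has the
same asymptotic distribution.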
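A numerical check of (5.6) and (5.7), written as a sketch with our own function names: pseudovalues obtained by actually deleting each ordered observation are compared with the closed form (5.6), and their average with $\Lambda_n(x) + \frac{n-1}{n}\delta_{(n)}(x)$. The data are the example of Section 2; at $x = 11$ the largest observation is a death below $x$, so the correction term is nonzero.

```python
def chf(times, deaths, x):
    """Sample cumulative hazard (5.2); assumes no tied times."""
    n = len(times)
    ordered = sorted(zip(times, deaths))
    return sum(1.0 / (n - i + 1)
               for i, (t, d) in enumerate(ordered, start=1) if d and t < x)


def pseudovalues_direct(times, deaths, x):
    """n*Lambda_n - (n-1)*Lambda_{n-1}(x; i), deleting the i-th ordered point."""
    n = len(times)
    ordered = sorted(zip(times, deaths))
    full = chf(times, deaths, x)
    out = []
    for i in range(n):
        rest = ordered[:i] + ordered[i + 1:]
        reduced = chf([t for t, _ in rest], [d for _, d in rest], x)   # (5.5)
        out.append(n * full - (n - 1) * reduced)
    return out


def pseudovalue_56(times, deaths, x, i):
    """Closed form (5.6) for the i-th ordered observation (1-based)."""
    n = len(times)
    dl = [1 if (d and t < x) else 0 for t, d in sorted(zip(times, deaths))]
    val = n * dl[i - 1] / (n - i + 1)
    val -= sum((j - 1) * dl[j - 1] / ((n - j) * (n - j + 1)) for j in range(1, i))
    val += sum(dl[j - 1] / (n - j + 1) for j in range(i + 1, n + 1))
    return val


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]
n = len(times)
for x in (6, 11):
    pseudo = pseudovalues_direct(times, deaths, x)
    t_max, d_max = max(zip(times, deaths))
    delta_n = 1 if (d_max and t_max < x) else 0          # delta_(n)(x)
    print(x, sum(pseudo) / n, chf(times, deaths, x) + (n - 1) / n * delta_n)
```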

In order to study the Kaplan-Meier estimator, expand the
logarithm of (2.1,b):

$$\ln \bar F_n^0(x) = -\Lambda_n(x) - \frac{1}{2}\sum_{i=1}^{n}\frac{\delta_{(i)}(x)}{(n-i+1)^2} - \cdots \qquad (5.9)$$

Now jackknife, and observe that jackknifing the second and
higher order terms in (5.9) leads to expressions which are
$o_p(1/\sqrt{n})$, and so the jackknifed version of $\ln \bar F_n^0(x)$ has
the same asymptotic (normal) distribution as $-\Lambda_n(x)$. Since
$\exp[\ln \bar F_n^0(x)] = \bar F_n^0(x)$, and the exponential function is smooth
(possesses a power-series expansion), it may be shown that the
normality of the jackknifed version of $\ln \bar F_n^0(x)$ implies that
of the jackknifed $\bar F_n^0(x)$. Furthermore, the asymptotic normal
distribution has mean $\bar F^0(x) = 1 - F^0(x)$ and variance

$$\frac{[1 - F^0(x)]^2}{n}\int_0^x \frac{dF^0}{(1-F)(1-F^0)}.$$
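For a concrete instance of the asymptotic variance just given, take (our own illustrative choice) unit-exponential deaths and exponential losses with rate $c$, so that $1 - F = e^{-(1+c)u}$ and the integral equals $(e^{(1+c)x} - 1)/(1+c)$. The sketch evaluates the integral numerically by the midpoint rule and compares it with this closed form; with $c = 0$ (no censoring) the variance reduces to the binomial value $\bar F^0(x)[1 - \bar F^0(x)]/n$. Function names are hypothetical.

```python
from math import exp, log


def km_asymptotic_variance(x, loss_rate, n, steps=20000):
    """[1 - F^0(x)]^2 / n * integral_0^x dF^0 / ((1 - F)(1 - F^0)), midpoint rule.

    Assumed model: X^0 unit exponential, Y exponential(loss_rate), so that
    1 - F = (1 - F^0)(1 - G) = exp(-(1 + loss_rate) u).
    """
    h = x / steps
    integral = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        density0 = exp(-u)                       # dF^0/du
        surv0 = exp(-u)                          # 1 - F^0(u)
        surv_obs = exp(-(1.0 + loss_rate) * u)   # 1 - F(u)
        integral += density0 / (surv_obs * surv0) * h
    return exp(-2.0 * x) * integral / n


def closed_form(x, loss_rate, n):
    c = 1.0 + loss_rate
    return exp(-2.0 * x) * (exp(c * x) - 1.0) / (c * n)


x, n = log(2.0), 25
print(km_asymptotic_variance(x, 0.5, n), closed_form(x, 0.5, n))
s = exp(-x)
print(km_asymptotic_variance(x, 0.0, n), s * (1 - s) / n)   # binomial when c = 0
```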



5.4. Consistency of the Sample Variance

It may be shown that the sample variance of the pseudovalues
converges (a.s.) to the correct asymptotic variance, further
justifying the use of the jackknife for large samples. We merely
sketch the demonstration; see Miller (1975) for details. Begin
again by considering the pseudovalues obtained by jackknifing
the sample cumulative hazard function. From (5.5) the jackknife
variance estimate is given by
$$n\,\widehat{\mathrm{Var}}\left[\tilde\Lambda_n(x)\right] = \frac{1}{n-1}\sum_{i=1}^{n}\left[\tilde\Lambda_n(x;i) - \tilde\Lambda_n(x)\right]^2 = (n-1)\sum_{i=1}^{n}\left(a_i - \bar a\right)^2, \qquad (5.10)$$

where

$$a_i = \frac{\delta_{(i)}(x)}{n-i+1} - \sum_{j=1}^{i-1}\frac{\delta_{(j)}(x)}{(n-j)(n-j+1)}, \qquad \bar a = \frac{1}{n}\sum_{i=1}^{n} a_i,$$

since by (5.5) $\Lambda_{n-1}(x;i) = \Lambda_n(x) - a_i$.

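Now square and study the individual terms. In particular, the
first sum of squares is

$$(n-1)\sum_{i=1}^{n}\left(\frac{\delta_{(i)}(x)}{n-i+1}\right)^2 = \frac{n-1}{n}\sum_{i=1}^{n}\left(\frac{n}{n-i+1}\right)^2\frac{\delta_{(i)}(x)}{n} \xrightarrow{\text{a.s.}} \int_0^x \frac{dF^0}{(1-F)(1-F^0)}, \qquad (5.11)$$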



agreeing with the correct value (5.8) multiplied by $n$. Consequently,
the remaining terms must cancel out in the a.s. limit if the jackknife
variance is to function properly, and they do; the steps are omitted
here, see Miller (1975) for details. Finally, the correctness of the
jackknife variance for the sample chf extends to the Kaplan-Meier
estimate by the previous arguments. It may also be shown that the
jackknife works properly on any estimator which is a sufficiently
smooth function of $\bar F_n^0$; in particular, the arc-sine, log, or logistic
transformations may all be jackknifed, which justifies the approach
taken in Sections 3 and 4.
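The identity behind (5.10) can be checked numerically. In this sketch the function names, the seed and the exponential model (unit-exponential deaths, exponential losses with rate 0.5) are our own choices. The sample variance of the chf pseudovalues is computed directly and via $(n-1)\sum_i (a_i - \bar a)^2$, and the leading term of (5.11) is printed alongside; for these choices its limit at $x = 0.5$ is $(e^{0.75} - 1)/1.5 \approx 0.745$.

```python
import random


def chf(times, deaths, x):
    """Sample cumulative hazard (5.2); assumes no tied times."""
    n = len(times)
    return sum(1.0 / (n - i + 1)
               for i, (t, d) in enumerate(sorted(zip(times, deaths)), start=1)
               if d and t < x)


def jackknife_var_direct(times, deaths, x):
    """(1/(n-1)) * sum of squared deviations of the chf pseudovalues."""
    n = len(times)
    full = chf(times, deaths, x)
    pseudo = [n * full - (n - 1) * chf(times[:j] + times[j + 1:],
                                       deaths[:j] + deaths[j + 1:], x)
              for j in range(n)]
    mean = sum(pseudo) / n
    return sum((p - mean) ** 2 for p in pseudo) / (n - 1)


def indicators(times, deaths, x):
    """delta_(i)(x) of (2.2) for i = 1, ..., n."""
    return [1 if (d and t < x) else 0 for t, d in sorted(zip(times, deaths))]


def jackknife_var_510(times, deaths, x):
    """Right-hand side of (5.10): (n - 1) * sum_i (a_i - a_bar)^2."""
    n = len(times)
    dl = indicators(times, deaths, x)
    a = [dl[i - 1] / (n - i + 1)
         - sum(dl[j - 1] / ((n - j) * (n - j + 1)) for j in range(1, i))
         for i in range(1, n + 1)]
    a_bar = sum(a) / n
    return (n - 1) * sum((ai - a_bar) ** 2 for ai in a)


def leading_term_511(times, deaths, x):
    """First sum of squares in (5.11)."""
    n = len(times)
    dl = indicators(times, deaths, x)
    return (n - 1) * sum(dl[i - 1] / (n - i + 1) ** 2 for i in range(1, n + 1))


rng = random.Random(7)
n = 200
x0 = [rng.expovariate(1.0) for _ in range(n)]
y = [rng.expovariate(0.5) for _ in range(n)]
times = [min(a, b) for a, b in zip(x0, y)]
deaths = [a <= b for a, b in zip(x0, y)]
print(jackknife_var_direct(times, deaths, 0.5),
      jackknife_var_510(times, deaths, 0.5),
      leading_term_511(times, deaths, 0.5))
```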